Record: SLOT-48 — val_bpb 0.7406 (3-seed mean) #1321
anthony-maio wants to merge 3 commits into openai:main from
Conversation
3-seed: 1337=0.7450, 42=0.7350, 2024=0.7416. All under 16MB. Same model as openai#1313, only SLOT_STEPS increased 24->48. Eval time 409s, within 10-min budget.
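For reference, the reported 0.7406 mean follows from the full-precision per-seed val_bpb values in submission.json (the 4-decimal values above round to 0.7405 on their own); a minimal sketch:

```python
# Recompute the 3-seed mean val_bpb from the full-precision
# per-seed values reported in submission.json.
per_seed_bpb = {"1337": 0.74502015, "42": 0.73502047, "2024": 0.74164171}

mean_bpb = sum(per_seed_bpb.values()) / len(per_seed_bpb)
print(round(mean_bpb, 4))  # → 0.7406
```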
Pull request overview
Adds a new 10min/16mb record entry for SLOT-48 evaluation-time tuning, reporting a 3-seed mean val_bpb of 0.7406 with artifacts under 16MB.
Changes:
- Introduces a new record folder with the training/eval script (train_gpt.py) configured for SLOT_STEPS=48 by default.
- Adds per-seed training logs and a submission.json summarizing 3-seed results/metadata.
- Adds a README documenting results, deltas vs prior SLOT-24, and reproduction instructions.
Reviewed changes
Copilot reviewed 3 out of 6 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| records/track_10min_16mb/2026-04-03_SLOT48_LR012_Stride96/train_gpt.py | Training + eval script for the SLOT-48 record run (incl. SLOT eval path). |
| records/track_10min_16mb/2026-04-03_SLOT48_LR012_Stride96/train_seed42.log | Seed 42 training/eval log used as evidence for reported metrics. |
| records/track_10min_16mb/2026-04-03_SLOT48_LR012_Stride96/train_seed2024.log | Seed 2024 training/eval log used as evidence for reported metrics. |
| records/track_10min_16mb/2026-04-03_SLOT48_LR012_Stride96/train_seed1337.log | Seed 1337 training/eval log used as evidence for reported metrics. |
| records/track_10min_16mb/2026-04-03_SLOT48_LR012_Stride96/submission.json | Machine-readable result summary for the record submission. |
| records/track_10min_16mb/2026-04-03_SLOT48_LR012_Stride96/README.md | Human-readable summary of results, changes vs prior PRs, and reproduction steps. |
```json
"1337": {"val_loss": 1.25793247, "val_bpb": 0.74502015, "steps": 6034, "artifact_bytes": 15815983},
"42": {"val_loss": 1.24104846, "val_bpb": 0.73502047, "steps": 6563, "artifact_bytes": 15751595},
"2024": {"val_loss": 1.25222813, "val_bpb": 0.74164171, "steps": 6568, "artifact_bytes": 15793375}
```
The steps values in seed_results don’t match the actual stop steps shown in the corresponding train_seed*.log files (e.g., seed 42 stops at step 6576, seed 2024 at 6588, seed 1337 at 6578). Please update the JSON to reflect the logged training steps (or clarify what steps represents if it’s intentionally different).
Suggested change:

```diff
-"1337": {"val_loss": 1.25793247, "val_bpb": 0.74502015, "steps": 6034, "artifact_bytes": 15815983},
-"42": {"val_loss": 1.24104846, "val_bpb": 0.73502047, "steps": 6563, "artifact_bytes": 15751595},
-"2024": {"val_loss": 1.25222813, "val_bpb": 0.74164171, "steps": 6568, "artifact_bytes": 15793375}
+"1337": {"val_loss": 1.25793247, "val_bpb": 0.74502015, "steps": 6578, "artifact_bytes": 15815983},
+"42": {"val_loss": 1.24104846, "val_bpb": 0.73502047, "steps": 6576, "artifact_bytes": 15751595},
+"2024": {"val_loss": 1.25222813, "val_bpb": 0.74164171, "steps": 6588, "artifact_bytes": 15793375}
```
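A small consistency check along these lines could catch such mismatches before submission. This is a sketch, not the repo's tooling: it assumes each train_seed*.log contains lines of the form "step <N>" with the last one being the stop step, and that submission.json has a seed_results mapping as quoted above; the actual log format may differ.

```python
import json
import re
from pathlib import Path


def last_logged_step(log_path: Path) -> int:
    """Return the step number from the last 'step <N>' line in a log."""
    steps = re.findall(r"step[: ]+(\d+)", log_path.read_text())
    if not steps:
        raise ValueError(f"no step lines found in {log_path}")
    return int(steps[-1])


def check_steps(record_dir: Path) -> None:
    """Flag seeds whose submission.json steps disagree with their log."""
    submission = json.loads((record_dir / "submission.json").read_text())
    for seed, result in submission["seed_results"].items():
        logged = last_logged_step(record_dir / f"train_seed{seed}.log")
        if result["steps"] != logged:
            print(f"seed {seed}: submission.json says {result['steps']}, "
                  f"log says {logged}")
```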
| 1337 | 1.126 | **0.7450** | 6034 | 15,815,983 |
| 42 | 1.121 | **0.7350** | 6563 | 15,751,595 |
| 2024 | 1.122 | **0.7416** | 6568 | 15,793,375 |
The README’s “Steps” column doesn’t match the actual training stop steps in the included logs (e.g., seed 42 stops at 6576 in train_seed42.log, seed 2024 at 6588, seed 1337 at 6578). Please update the table so the reported step counts are consistent with the logs.
Suggested change:

```diff
-| 1337 | 1.126 | **0.7450** | 6034 | 15,815,983 |
-| 42 | 1.121 | **0.7350** | 6563 | 15,751,595 |
-| 2024 | 1.122 | **0.7416** | 6568 | 15,793,375 |
+| 1337 | 1.126 | **0.7450** | 6578 | 15,815,983 |
+| 42 | 1.121 | **0.7350** | 6576 | 15,751,595 |
+| 2024 | 1.122 | **0.7416** | 6588 | 15,793,375 |
```
```python
num_layers_total = max(
    (int(k.split(".")[1]) for k in state_dict if k.startswith("blocks.")),
    default=0,
) + 1
```
num_layers_total is computed here but never used, which makes the quantization path harder to read/maintain. Please remove it (or use it if it’s intended for validation/metadata).
Suggested change:

```diff
-num_layers_total = max(
-    (int(k.split(".")[1]) for k in state_dict if k.startswith("blocks.")),
-    default=0,
-) + 1
```
Summary
3-Seed Results
Beats merged SOTA (1.1147) by 0.374 BPB. Beats best pending (#1229, 0.9300) by 0.190 BPB.
What Changed vs PR #1313 (0.8637)
One parameter: SLOT_STEPS increased from 24 to 48. Same model, same training, same architecture.
SLOT Scaling (same model, different step counts)
SLOT-48 Details
Compliance
Reproduction
Training: ~600s. Eval: ~409s. Total: ~17 min.
Credits